Abstract

We describe a new programming idiom for concurrency, based on 
Applicative Functors, where concurrency is implicit in the Applica- 
tive <*> operator. The result is that concurrent programs can be 
written in a natural applicative style, and they retain a high degree 
of clarity and modularity while executing with maximal concur- 
rency. This idiom is particularly useful for programming against 
external data sources, where the application code is written without 
the use of explicit concurrency constructs, while the implementa- 
tion is able to batch together multiple requests for data from the 
same source, and fetch data from multiple sources concurrently. 
Our abstraction uses a cache to ensure that multiple requests for 
the same data return the same result, which frees the programmer 
from having to arrange to fetch data only once, which in turn leads 
to greater modularity. 

While it is generally applicable, our technique was designed 
with a particular application in mind: an internal service at Face- 
book that identifies particular types of content and takes actions 
based on it. Our application has a large body of business logic that 
fetches data from several different external sources. The framework 
described in this paper enables the business logic to execute ef- 
ficiently by automatically fetching data concurrently; we present 
some preliminary results. 

Keywords Haskell; concurrency; applicative; monad; data-fetching; 
distributed 

1. Introduction 

Consider the problem of building a network service that encap- 
sulates business logic behind an API; a special case of this being 
a web-based application. Services of this kind often need to effi- 
ciently obtain and process data from a heterogeneous set of external 
sources. In the case of a web application, the service usually needs 
to access at least databases, and possibly other application-specific 
services that make up the distributed architecture of the system. 

The business logic in this setting is the code that determines, 
for each request made using this service, what data to deliver as 
the result. In the case of a web application, the input is an HTTP 
request, and the output is a web page. Our goal is to have clear and 
concise business logic, uncluttered by performance-related details.
In particular the programmer should not need to be concerned 
with accessing external data efficiently. However, one particular 
problem often arises that creates a tension between conciseness and 
efficiency in this setting: accessing multiple remote data sources 
efficiently requires concurrency, and that normally requires the 
programmer to intervene and program the concurrency explicitly. 

When the business logic is only concerned with reading data 
from external sources and not writing, the programmer doesn't 
care about the order in which data accesses happen, since there 
are no side-effects that could make the result different when the 
order changes. So in this case the programmer would be entirely 
happy with not having to specify either ordering or concurrency, 
and letting the system perform data access in the most efficient way 
possible. In this paper we present an embedded domain-specific 
language (EDSL), written in Haskell, that facilitates this style of 
programming, while automatically extracting and exploiting any 
concurrency inherent in the program. 

Our contributions can be summarised as follows: 

• We present an Applicative abstraction that allows implicit 
concurrency to be extracted from computations written with a 
combination of Monad and Applicative. This is an extension 
of the idea of concurrency monads [10], using Applicative <*>
as a way to introduce concurrency (Section 4). We then develop
the idea into an abstraction that supports concurrent access to
remote data (Section 5), and failure (Section 8).

• We show how to add a cache to the framework (Section 6).
The cache memoises the results of previous data fetches, which 
provides not only performance benefits, but also consistency in 
the face of changes to the external data. 

• We show that it isn't necessary for the programmer to use 
Applicative operators in order to benefit from concurrency 
in our framework, for two reasons: first, bulk monadic oper- 
ations such as maps and filters use Applicative internally, 
which provides a lot of the benefit of Applicative concurrency
for almost zero effort (Section 5.5), and secondly we can
automatically translate code written using monadic style into
Applicative in certain cases (Section 7).

• We have implemented this system at Facebook in a back-end 
service that contains over 200,000 lines of business logic. We 
present some preliminary results showing that our system running
with production data efficiently optimises the data accesses.
When running without our automatic concurrency, typical
latencies were 51% longer (Section 9).

While our work is mostly focused on a setting in which all 
the operations of the DSL are data reads, we consider how to 
incorporate side-effecting operations in Section 9.3. Section 10
compares our design with other concurrent programming models.



2. Motivation 

To motivate the design, we will present two use cases. The first is 
a typical web application, which needs to render a web page based 
on data fetched from one or more external sources. The second is 
a real-world use case from Facebook: a rule-engine for detecting 
certain types of content and taking actions based on it. 

2.1 Example: rendering a blog 

In this example we'll look at some code to render a blog, focusing 
on the part of the application that fetches and processes the data 
from the external data source (e.g. a database). The blog web page 
will consist of two panes: 

• The main pane shows the most recent posts to the blog in date 
order. 

• The side pane contains two sub-panes: 

■ a list of the posts with the most page views ("popular 
posts"), 

■ a list of topics and the number of posts in each topic. 

Assuming a set of operations to fetch the necessary data, and a 
set of functions to actually render the HTML, the task is to write the 
code to collect the necessary data and call the rendering functions 
for each of the separate parts of the page. The goal is to write code 
that has two properties: 

• It should be modular, so that new sections on the page can be 
added and removed without disturbing the rest of the code. 

• It should execute efficiently, but without the programmer having 
to implement optimisations manually. In particular, we should 
be fetching as much data concurrently as possible. 

Our framework allows both of these goals to be met; the code 
will be both maximally modular and maximally efficient (in terms 
of overlapping and batching external requests for data). 

The example requires a bit of setup. First, some types: 

data PostId       -- identifies a post
data Date         -- a calendar date
data PostContent  -- the content of a post

data PostInfo = PostInfo
  { postId    :: PostId
  , postDate  :: Date
  , postTopic :: String
  }

A post on the blog is represented by two types: PostInfo and
PostContent. PostInfo contains the metadata about the post: the
date it was created, and its topic. The actual content of the post is
represented by the abstract PostContent type.

Posts have an identifier that allows them to be fetched from the
database, namely PostId. For the purposes of this example we will
assume the simplest storage model possible: the storage performs 
no computation at all, so all sorting, joining, and so forth must be 
done by the client. 

Our computation will be done in a monad called Fetch. The 
implementation of Fetch will be given later, but for this example 
all we need to know is that Fetch has instances of Monad, Functor 
and Applicative, and has the following operations for fetching 
data: 

getPostIds     :: Fetch [PostId]
getPostInfo    :: PostId -> Fetch PostInfo
getPostContent :: PostId -> Fetch PostContent
getPostViews   :: PostId -> Fetch Int



getPostIds returns the identifiers of all the posts, getPostInfo
retrieves the metadata about a particular post, getPostContent
fetches the content of a post, and finally getPostViews returns 
a count of the number of page views for a post. Each of these 
operations needs to retrieve the data from some external source, 
perhaps one or more databases. Furthermore a database might be 
highly distributed, so there is no expectation that any two requests 
will be served by the same machine. 

We assume a set of rendering functions, including:

renderPosts :: [(PostInfo, PostContent)] -> Html
renderPage  :: Html -> Html -> Html

renderPosts takes a set of posts and returns the corresponding
HTML. Note that we need both the PostInfo and the PostContent
to render a post. The renderPage function constructs the whole 
page given the HTML for the side pane and the main pane. We'll 
see various other functions beginning with render; the implemen- 
tations of these functions aren't important for the example. 

Now that the background is set, we can move on to the actual 
code of the example. We'll start at the top and work down; here is 
the top-level function, blog: 

blog :: Fetch Html
blog = renderPage <$> leftPane <*> mainPane

blog generates a web page by calling leftPane and mainPane to 
generate the two panes, and then calling renderPage to put the re- 
sults together. Note that we're using the Applicative combinators 
<$> and <*> to construct the expression: leftPane and mainPane 
are both Fetch operations because they will need to fetch data. 

To make the main pane, we need to fetch all the information 
about the posts, sort them into date order, and then take the first 
few (say 5) to pass to renderPosts: 

mainPane :: Fetch Html
mainPane = do
  posts <- getAllPostsInfo
  let ordered =
        take 5 $
        sortBy (flip (comparing postDate)) posts
  content <- mapM (getPostContent . postId) ordered
  return $ renderPosts (zip ordered content)

Here getAllPostsInfo is an auxiliary function, defined as
follows:

getAllPostsInfo :: Fetch [PostInfo]
getAllPostsInfo = mapM getPostInfo =<< getPostIds

As you might expect, to fetch all the PostInfos we have to
first fetch all the PostIds with getPostIds, and then fetch each
PostInfo with getPostInfo.

The left pane consists of two sub-panes, so in order to construct 
the left pane we must render the sub-panes and put the result 
together by calling another rendering function, renderSidePane: 

leftPane :: Fetch Html
leftPane = renderSidePane <$> popularPosts <*> topics

Next we'll look at the popularPosts sub-pane. In order to
define this we'll need an auxiliary function, getPostDetails,
which fetches both the PostInfo and the PostContent for a post:

getPostDetails :: PostId -> Fetch (PostInfo, PostContent)
getPostDetails pid =
  (,) <$> getPostInfo pid <*> getPostContent pid



Here is the code for popularPosts: 

popularPosts :: Fetch Html
popularPosts = do
  pids  <- getPostIds
  views <- mapM getPostViews pids
  let ordered =
        take 5 $ map fst $
        sortBy (flip (comparing snd))
               (zip pids views)
  content <- mapM getPostDetails ordered
  return $ renderPostList content

First we get the list of PostIds, and then the number of page views
for each of these. The page-view counts are used to sort the
list; the value ordered is a list of the top five PostIds by page
views. We can use this list to fetch the information about the posts 
that we need to render, by calling getPostDetails for each one, 
and finally the result is passed to renderPostList to render the 
list of popular posts. 

Next the code for rendering the menu of topics: 

topics :: Fetch Html
topics = do
  posts <- getAllPostsInfo
  let topiccounts =
        Map.fromListWith (+)
          [ (postTopic p, 1) | p <- posts ]
  return $ renderTopics topiccounts

Creating the list of topics is a matter of calculating a mapping
from topic to the number of posts in that topic from the list of
PostInfos, and then passing that to renderTopics to render it.

This completes the code for the example. The code clearly 
expresses the functionality of the application, with no concession 
to performance. Yet we want it to execute efficiently too; there are 
two ways in which our framework will automatically improve the 
efficiency when this code is executed: 

• Concurrency. A lot of the data fetching can be done concur- 
rently. For example: 

■ every time we use mapM with a data-fetching operation, 
there is an opportunity for concurrency. 

■ we can compute mainPane and leftPane at the same time,
and within leftPane we can compute popularPosts and
topics at the same time.

Our goal is to exploit all this inherent concurrency without 
the programmer having to lift a finger. The framework we 
will describe in this paper does exactly that: with the code as 
written, the data will be fetched in the pattern shown in Figure 1.
The dotted lines indicate a round of data-fetching, where all 
the items in a round are fetched concurrently. There are three 
rounds: 

■ getPostIds (needed by all three panes)

■ getPostInfo for all posts (needed by mainPane and
topics), and getPostViews for all posts (needed by
popularPosts).

■ getPostContent for each of the posts displayed in the
main pane, and getPostInfo and getPostContent for
each of the posts displayed in popularPosts.

• Caching. We made no explicit attempt to fetch each piece of 
data only once. For example, we are calling getPostIds three
times. Remember the goal is to be modular: there is no global 
knowledge about what data is needed by each part of the page. 

Figure 1. Data fetching in the blog example



Furthermore, even though we could reasonably predict that we 
need getPostIds in several places and so do it once up front, it
is much harder to predict which getPostContent calls will be 
made: the main pane displays the five most recent posts, and the 
side pane displays the five most popular posts. There might well 
be overlap between these two sets, but to write the code to fetch 
the minimal set of PostContent would require destroying the 
modularity. 

Our system uses caching to avoid fetching the same data mul- 
tiple times, which lets the programmer keep the modularity in 
their code without worrying about duplicate data fetching.
Furthermore, as we describe in Section 6, caching has important
benefits beyond the obvious performance gains. 

2.2 Example: a data-rich DSL 

Our second use-case is a service inside the Facebook infrastructure 
that identifies spam, malware, and other types of undesirable content
[11]. Every action that creates an item of content on the site
results in a request to this service, and it is the job of the service to
return a result indicating whether the content should be allowed or
rejected.¹ The service runs on many machines, and each instance of
the service runs the same set of business logic, which is typically 
modified many times per day. 

As an example of the kind of calculations that our business logic 
needs to perform, consider this hypothetical expression fragment: 

length (intersect (friendsOf x) (friendsOf y))

length is the usual list length operation, intersect takes the
intersection of two lists, and friendsOf is a function that returns
the list of friends of a user:

friendsOf :: UserId -> [UserId]



¹ This is a huge simplification, but will suffice for this paper.



The value of this expression is the number of friends that x and 
y have in common; this value tends to be a useful quantity in our 
business logic and is often computed. 

This code fragment is an example of how we would like the 
business logic to look: clear, concise, and without any mention of 
implementation details. 

Now, the friendsOf function needs to access a remote database
in order to return its result. So if we were to implement this directly
in Haskell, even if we hide the remote data access behind a pure
API like friendsOf, when we run the program it will make two
requests for data in series: first to fetch the friends of x, and then
to fetch the friends of y. We ought to do far better than this: not
only could we do these two requests concurrently, but in fact the
database serving these requests (TAO [14]) supports submitting
several requests as a single batch, so we could submit both requests 
in a single unit. 

The question is, how could we modify our language such that 
it supports an implementation that submits these two requests con- 
currently? The problem is not just one of exploring simple expres- 
sions like this; in general we might have to wait for the results of 
some data accesses before we can evaluate more of the expression. 
Consider this: 

let
  numCommonFriends =
    length (intersect (friendsOf x) (friendsOf y))
in
  if numCommonFriends < 2 && daysRegistered x < 30
    then ...
    else ...

Here daysRegistered returns the number of days that a user has 
been registered on the site. 

So now, assuming that we want a lazy && such that if the left 
side is False we don't evaluate the right side at all, then we cannot 
fetch the data for daysRegistered until we have the results of the 
two friendsOf calls.

Scaling this up, when we consider computing the result of a 
request that involves running a large amount of business logic, in 
general at any given time there might be many requests for data 
that could be submitted concurrently. Having fetched the results 
of those requests, the computation can proceed further, possibly 
along multiple paths, until it gets blocked again on a new set of 
data fetches. 

Our solution is to build an abstraction using Applicative and 
Monad to support concurrent data access, which we describe in the 
next few sections. We will return in Section 5.3 to see how our DSL
looks when built on top of the framework. 

3. Concurrency monads 

A concurrency monad embodies the fundamental notion of a com- 
putation that can pause and be resumed. The concurrency monad 
will be the foundation of the abstractions we develop in this paper. 
Here is its type: 

data Fetch a = Done a | Blocked (Fetch a)

An operation of type Fetch a has either completed and deliv- 
ered a value a, indicated by Done, or it is blocked (or paused), in- 
dicated by Blocked. The argument to Blocked is the computation 
to run to continue, of type Fetch a. 

For reference, we give the definitions of the Functor and
Monad type classes in Figure 2. The instances of Functor and
Monad for Fetch are as follows:



class Functor f where
  fmap :: (a -> b) -> f a -> f b

class Functor f => Applicative f where
  pure  :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b

class Monad f where
  return :: a -> f a
  (>>=)  :: f a -> (a -> f b) -> f b

ap :: (Monad m) => m (a -> b) -> m a -> m b
ap mf mx = do f <- mf; x <- mx; return (f x)



Figure 2. Definitions of Functor, Applicative, Monad, and ap 



instance Functor Fetch where
  fmap f (Done x)    = Done (f x)
  fmap f (Blocked c) = Blocked (fmap f c)

instance Monad Fetch where
  return = Done

  Done a    >>= k = k a
  Blocked c >>= k = Blocked (c >>= k)

In general, a computation in this monad will be a sequence of 
Blocked constructors ending in a Done with the return value. This 
is the essence of (cooperative) concurrency: for example, one could 
implement a simple round-robin scheduler to interleave multiple 
tasks by keeping track of a queue of blocked tasks, running the task 
at the front of the queue until it blocks again, and then returning it 
to the end of the queue. 
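
The round-robin scheduler mentioned above can be sketched in a few lines. This is only an illustration of the idea, not part of our framework: the names yield, roundRobin and task are hypothetical, and the Applicative instance is the sequential ap-based one, included only because current GHC versions require every Monad to have one.

```haskell
import Control.Monad (ap)

-- The simple resumption type from this section.
data Fetch a = Done a | Blocked (Fetch a)

instance Functor Fetch where
  fmap f (Done x)    = Done (f x)
  fmap f (Blocked c) = Blocked (fmap f c)

-- Placeholder instance (sequential); required by current GHC versions.
instance Applicative Fetch where
  pure  = Done
  (<*>) = ap

instance Monad Fetch where
  Done a    >>= k = k a
  Blocked c >>= k = Blocked (c >>= k)

-- Hypothetical helper: pause once, then continue.
yield :: Fetch ()
yield = Blocked (Done ())

-- Hypothetical scheduler: take the task at the front of the queue;
-- if it has finished, emit its result, otherwise it has blocked, so
-- move its continuation to the back. Results come out in completion order.
roundRobin :: [Fetch a] -> [a]
roundRobin []                 = []
roundRobin (Done a    : rest) = a : roundRobin rest
roundRobin (Blocked c : rest) = roundRobin (rest ++ [c])

-- A task that pauses n times before returning its label.
task :: Int -> String -> Fetch String
task n label = mapM_ (const yield) [1 .. n] >> return label
```

For instance, roundRobin [task 2 "a", task 0 "b", task 1 "c"] completes the tasks in the order "b", "c", "a", since each Blocked layer costs a task one trip around the queue.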

Our monad isn't very useful yet. There are two key pieces 
missing: a way to introduce concurrency into a computation, and 
a way for a computation to say what data it is waiting for when 
it blocks. We will present these elaborations respectively in the 
next two sections. Following that, we will return to our motivating 
examples and show how the Fetch framework enables efficient and 
modular data-fetching. 

4. Applicative concurrency 

Concurrency monads have occurred in the literature several times.
Scholz [10] originally introduced a concurrency monad based on a
continuation monad, and then Claessen [2] used this as the basis for
his Poor Man's Concurrency Monad. This idea was used by Li and
Zdancewic [5] to implement scalable network services. A slightly
different formulation but with similar functionality was dubbed
the resumption monad by Harrison [4]. The resumption monad
formulation was used in describing the semantics of concurrency
by Swierstra and Altenkirch [12]. Our Fetch monad follows the
resumption monad formulation. It is also worth noting that this idea
is an instance of a free monad [1].

All these previous formulations of concurrency monads used 
some kind of fork operation to explicitly indicate when to create 
a new thread of control. In contrast, in this paper there is no fork. 
The concurrency will be implicit in the structure of the computa- 
tions we write using this abstraction. To make it possible to build 
computations that contain implicit concurrency, we need to make 
Fetch an Applicative Functor [7]. For reference, the definition of
the Applicative class is given in Figure 2 (omitting the *> and
<* operators, which are not important for this paper). 

Applicative Functors are a class of functors that may have effects
that compose using the <*> operator. Morally, the class of
Applicative Functors sits between Functors and Monads: every Monad
is an Applicative Functor, but the reverse is not true. For historical
reasons, Applicative is not currently a superclass of Monad in
Haskell, although this is expected to change in the future.

An Applicative instance can be given for any Monad, simply 
by making pure = return and <*> = ap (Figure 2). However,
for Fetch we want a custom Applicative instance that takes 
advantage of the fact that the arguments to <*> are independent, 
and uses this to introduce concurrency: 

instance Applicative Fetch where
  pure = return

  Done g    <*> Done y    = Done (g y)
  Done g    <*> Blocked c = Blocked (g <$> c)
  Blocked c <*> Done y    = Blocked (c <*> Done y)
  Blocked c <*> Blocked d = Blocked (c <*> d)

This is the key piece of our design: when computations in Fetch 
are composed using the <*> operator, both arguments of <*> can 
be explored to search for Blocked computations, which creates the 
possibility that a computation may be blocked on multiple things 
simultaneously. This is in contrast to the monadic bind operator, 
>>=, which does not admit exploration of both arguments, because
the right hand side cannot be evaluated without the result from the 
left. 

For comparison, if we used <*> = ap, the standard definition 
for a Monad, we would get the following (refactored slightly): 

instance Applicative Fetch where
  pure = return

  Done f    <*> x = f <$> x
  Blocked c <*> x = Blocked (c <*> x)

Note how only the first argument of <*> is inspected. The differ- 
ence between these two will become clear if we consider an ex- 
ample: Blocked (Done (+1)) <*> Blocked (Done 1). Un- 
der our Applicative instance this evaluates to: 

Blocked (Done (+1) <*> Done 1)
==>
Blocked (Done (1 + 1))

whereas under the standard Applicative instance, the same
example would evaluate to:

Blocked (Done (+1) <*> Blocked (Done 1))
==>
Blocked ((+1) <$> Blocked (Done 1))
==>
Blocked (Blocked ((+1) <$> Done 1))
==>
Blocked (Blocked (Done (1 + 1)))

If Blocked indicates a set of remote data fetches that must be 
performed (we'll see how this happens in the next section), then 
with our Applicative instance we only have to stop and fetch data 
once, whereas the standard instance has two layers of Blocked, so 
we would stop twice. 
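
The difference can be checked mechanically. The following sketch reuses the simple Fetch type of Section 3 with our Applicative instance, and counts the layers of Blocked in the result; the helper rounds and the names concurrent and sequential are hypothetical, introduced only for this illustration.

```haskell
import Control.Monad (ap)

data Fetch a = Done a | Blocked (Fetch a)

instance Functor Fetch where
  fmap f (Done x)    = Done (f x)
  fmap f (Blocked c) = Blocked (fmap f c)

-- The concurrent instance: both arguments are inspected.
instance Applicative Fetch where
  pure = Done
  Done g    <*> Done y    = Done (g y)
  Done g    <*> Blocked c = Blocked (g <$> c)
  Blocked c <*> Done y    = Blocked (c <*> Done y)
  Blocked c <*> Blocked d = Blocked (c <*> d)

instance Monad Fetch where
  Done a    >>= k = k a
  Blocked c >>= k = Blocked (c >>= k)

-- Hypothetical helper: number of Blocked layers, and the final value.
rounds :: Fetch a -> (Int, a)
rounds (Done a)    = (0, a)
rounds (Blocked c) = let (n, a) = rounds c in (n + 1, a)

-- The example from the text, composed with our <*> ...
concurrent :: Fetch Int
concurrent = Blocked (Done (+1)) <*> Blocked (Done 1)

-- ... and with the standard monadic ap.
sequential :: Fetch Int
sequential = Blocked (Done (+1)) `ap` Blocked (Done 1)
```

Here rounds concurrent is (1, 2) while rounds sequential is (2, 2): the same value, but one stop instead of two.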

Now that we have established the basic idea, we need to elabo- 
rate it to do something useful; namely to perform multiple requests 
for data simultaneously. 

5. Fetching data 

In order to fetch some data, we need a primitive that takes a 
description of the data to fetch, and returns the data itself. We will 
call this operation dataFetch: 



dataFetch :: Request a -> Fetch a

where Request is an application-specific type that specifies re- 
quests; a value of type Request a is an instruction that the system 
can use to fetch a value of type a. For now the Request type is a 
concrete but unspecified type; we will show how to instantiate this 
for our blog example in Section 5.2, and we outline how to abstract
the framework over the request type in Section 9.

How can we implement dataFetch? One idea is to elaborate 
the Blocked constructor to include a request: 

data Fetch a
  = Done a
  | forall r . Blocked (Request r) (r -> Fetch a)

This works for a single request, but quickly runs into trouble 
when we want to block on multiple requests because it becomes 
hard to maintain the connections between multiple result types r 
and their continuations. 

We solve this problem by storing results in mutable refer- 
ences. This requires two changes. First we encapsulate the re- 
quest and the place to store the result in an existentially quantified 
BlockedRequest type: 

data BlockedRequest =
  forall a . BlockedRequest (Request a)
                            (IORef (FetchStatus a))

(a forall outside the constructor definition is Haskell's syntax 
for an existentially-quantified type variable). IORef is Haskell's 
mutable reference type, which supports the following operations 
for creation, reading and writing respectively: 

newIORef   :: a -> IO (IORef a)
readIORef  :: IORef a -> IO a
writeIORef :: IORef a -> a -> IO ()

The FetchStatus type is defined as follows: 

data FetchStatus a
  = NotFetched
  | FetchSuccess a

Before the result is available, the IORef contains NotFetched. 
When the result is available, it contains FetchSuccess. As we will 
see later, using an IORef here also makes it easier to add caching 
to the framework. 

The use of IORef requires that we layer our monad on top of the
IO monad. In practice this isn't a drawback, because the IO monad
is necessary in order to perform the actual data fetching, so it will
be available when executing a computation in the Fetch monad
anyway. The IO monad will not be exposed to user code.

Considering that we will need computations that can block on 
multiple requests, our monad now also needs to collect the set 
of BlockedRequest associated with a blocked computation. A 
list would work for this purpose, but it suffers from performance 
problems due to nested appends, so instead we will use Haskell's 
Seq type, which supports logarithmic-time append. 
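
As a small illustration of the Seq operations this relies on (a sketch only; the request labels below are made-up strings standing in for BlockedRequest values), two sets of pending requests are combined with <> and later flattened with toList:

```haskell
import Data.Foldable (toList)
import Data.Sequence (Seq, fromList, singleton)

-- Hypothetical pending requests from the two sides of a <*>.
leftRequests, rightRequests :: Seq String
leftRequests  = fromList ["views 1", "views 2"]
rightRequests = singleton "info 1"

-- Appending preserves order: left-hand requests come first.
combined :: [String]
combined = toList (leftRequests <> rightRequests)
```

The value of combined is ["views 1", "views 2", "info 1"].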

With these two modifications (adding IO and attaching Seq
BlockedRequest to the Blocked constructor), the monad now
looks like this:

data Result a
  = Done a
  | Blocked (Seq BlockedRequest) (Fetch a)

newtype Fetch a = Fetch { unFetch :: IO (Result a) }



instance Applicative Fetch where
  pure = return

  Fetch f <*> Fetch x = Fetch $ do
    f' <- f
    x' <- x
    case (f', x') of
      (Done g,        Done y       ) -> return (Done (g y))
      (Done g,        Blocked br c ) -> return (Blocked br (g <$> c))
      (Blocked br c,  Done y       ) -> return (Blocked br (c <*> return y))
      (Blocked br1 c, Blocked br2 d) -> return (Blocked (br1 <> br2) (c <*> d))



Figure 3. Applicative instance for Fetch 



instance Monad Fetch where
  return a = Fetch $ return (Done a)

  Fetch m >>= k = Fetch $ do
    r <- m
    case r of
      Done a       -> unFetch (k a)
      Blocked br c -> return (Blocked br (c >>= k))

and the Applicative instance is given in Figure 3. Note that in the
case where both arguments to <*> are Blocked, we must combine 
the sets of blocked requests from each side. 

Finally we are in a position to implement dataFetch: 

dataFetch :: Request a -> Fetch a
dataFetch request = Fetch $ do
  box <- newIORef NotFetched             -- (1)
  let br = BlockedRequest request box    -- (2)
  let cont = Fetch $ do                  -- (3)
        FetchSuccess a <- readIORef box  -- (4)
        return (Done a)                  -- (5)
  return (Blocked (singleton br) cont)   -- (6)

Where: 

• Line 1 creates a new IORef to store the result, initially contain- 
ing NotFetched. 

• Line 2 creates a BlockedRequest for this request. 

• Lines 3-5 define the continuation, which reads the result from 
the IORef and returns it in the monad. Note that the contents 
of the IORef is assumed to be FetchSuccess a when the 
continuation is executed. It is an internal error of the framework 
if this is not true, so we don't attempt to handle the error 
condition here. 

• Line 6: dataFetch returns Blocked in the monad, including 
the BlockedRequest. 

5.1 Running a computation 

We've defined the Fetch type and its Monad and Applicative 
instances, but we also need a way to run a Fetch computation. 
Clearly the details of how we actually fetch data are application- 
specific, but there's a standard pattern for running a computation 
that works in all settings. 

The application-specific data-fetching can be abstracted as a 
function fetch: 

fetch :: [BlockedRequest] -> IO ()

The job of fetch is to fill in the IORef in each BlockedRequest 
with the data fetched. Ideally, fetch will take full advantage of 
concurrency where possible, and will batch together requests for
data from the same source. For example, multiple HTTP requests
could be handled by a pool of connections where each connection 
processes a pipelined batch of requests. Our actual implementa- 
tion at Facebook has several data sources, corresponding to various 
internal services in the Facebook infrastructure. Most have asyn- 
chronous APIs but some are synchronous, and several of them sup- 
port batched requests. We can fetch data from all of them concur- 
rently. 

Given fetch, the basic scheme for running a Fetch computa- 
tion is as follows: 

runFetch :: Fetch a -> IO a
runFetch (Fetch h) = do
  r <- h
  case r of
    Done a -> return a
    Blocked br cont -> do
      fetch (toList br)
      runFetch cont

This works as follows. First, we run the Fetch computation. If 
the result was Done, then we are finished; return the result. If the 
result was Blocked, then fetch the data by calling fetch, and then 
run the continuation from the Blocked constructor by recursively 
invoking runFetch. 

The overall effect is to run the computation in stages that we call 
rounds. In each round runFetch performs as much computation as 
possible and then performs all the data fetching concurrently. This 
process is repeated until the computation returns Done. 
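
To make the round structure concrete, the pieces of this section can be assembled into a small self-contained program. This is a sketch under stated assumptions rather than our production code: the Request constructors Square and Greet, the local dummy fetch, and the runFetchCounting variant (which also reports the batch size of each round) are hypothetical and exist only for illustration.

```haskell
{-# LANGUAGE GADTs, ExistentialQuantification #-}
import Data.Foldable (toList)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Sequence (Seq, singleton)

-- Hypothetical request type, invented for this sketch.
data Request a where
  Square :: Int -> Request Int
  Greet  :: String -> Request String

data FetchStatus a = NotFetched | FetchSuccess a

data BlockedRequest =
  forall a . BlockedRequest (Request a) (IORef (FetchStatus a))

data Result a = Done a | Blocked (Seq BlockedRequest) (Fetch a)

newtype Fetch a = Fetch { unFetch :: IO (Result a) }

instance Functor Fetch where
  fmap f (Fetch m) = Fetch $ do
    r <- m
    return $ case r of
      Done a       -> Done (f a)
      Blocked br c -> Blocked br (fmap f c)

instance Applicative Fetch where
  pure a = Fetch (return (Done a))
  Fetch f <*> Fetch x = Fetch $ do
    f' <- f
    x' <- x
    case (f', x') of
      (Done g,        Done y       ) -> return (Done (g y))
      (Done g,        Blocked br c ) -> return (Blocked br (g <$> c))
      (Blocked br c,  Done y       ) -> return (Blocked br (c <*> pure y))
      (Blocked br1 c, Blocked br2 d) -> return (Blocked (br1 <> br2) (c <*> d))

instance Monad Fetch where
  Fetch m >>= k = Fetch $ do
    r <- m
    case r of
      Done a       -> unFetch (k a)
      Blocked br c -> return (Blocked br (c >>= k))

dataFetch :: Request a -> Fetch a
dataFetch request = Fetch $ do
  box <- newIORef NotFetched
  let br   = BlockedRequest request box
      cont = Fetch $ do
        status <- readIORef box
        case status of
          FetchSuccess a -> return (Done a)
          NotFetched     -> error "dataFetch: result was not filled in"
  return (Blocked (singleton br) cont)

-- Dummy data source: answers every request locally.
fill :: BlockedRequest -> IO ()
fill (BlockedRequest (Square n) box) = writeIORef box (FetchSuccess (n * n))
fill (BlockedRequest (Greet s)  box) = writeIORef box (FetchSuccess ("hello " ++ s))

fetch :: [BlockedRequest] -> IO ()
fetch = mapM_ fill

-- Like runFetch, but also returns the number of requests in each round.
runFetchCounting :: Fetch a -> IO (a, [Int])
runFetchCounting (Fetch h) = do
  r <- h
  case r of
    Done a -> return (a, [])
    Blocked br cont -> do
      fetch (toList br)
      (a, sizes) <- runFetchCounting cont
      return (a, length br : sizes)

-- Two independent requests, then one that depends on their results.
example :: Fetch (Int, String)
example = do
  (a, b) <- (,) <$> dataFetch (Square 3) <*> dataFetch (Square 4)
  s <- dataFetch (Greet (show (a + b)))
  return (a + b, s)
```

Running runFetchCounting example yields ((25, "hello 25"), [2, 1]): the two Square requests are batched into the first round, and the dependent Greet request forms a second round of its own.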

By performing as much computation as possible we maximise 
the amount of data fetching we can perform concurrently. This 
makes good use of our network resources, by providing the maxi- 
mum chance that we can batch multiple requests to the same data 
source, but it might not be the optimal scheme from a latency
perspective; we consider alternatives in Section 11.

Our design does not impose a particular concurrency strategy on 
the data sources. The implementation of fetch has complete free- 
dom to use the most appropriate strategy for executing the requests 
it is given. Typically that will involve a combination of batching re- 
quests to individual data sources, and performing requests to mul- 
tiple data sources concurrently with each other using Haskell's ex- 
isting concurrency mechanisms. 

5.2 Example: blog 

In this section we will instantiate our framework for the blog
example described in Section 2.1 and show how it delivers automatic
concurrency.

First, we need to define the Request type. Requests are parame- 
terised by their result type, and since there will be multiple requests 
with different result types, a Request must be a GADT [9]. Here
is the Request type for our blog example: 



data Request a where
  FetchPosts       :: Request [PostId]
  FetchPostInfo    :: PostId -> Request PostInfo
  FetchPostContent :: PostId -> Request PostContent
  FetchPostViews   :: PostId -> Request Int

Next we need to provide implementations for the data-fetching
operations (getPostIds etc.), which are simply calls to dataFetch
passing the appropriate Request:

    getPostIds     = dataFetch FetchPosts
    getPostInfo    = dataFetch . FetchPostInfo
    getPostContent = dataFetch . FetchPostContent
    getPostViews   = dataFetch . FetchPostViews

Now, if we provide a dummy implementation of fetch that
simulates a remote data source and prints out requests as they are
made (footnote 2), we do indeed find that the requests are made in
three rounds as described in Section 2.1. A real implementation of
fetch would perform the requests in each round concurrently.

5.3 Example: Haxl 

In Section 2.2 we introduced our motivation for designing the
applicative concurrency abstraction. Our implementation is called
Haxl, and we will describe it in more detail in Section 9.1. Here,
we briefly return to the original example to show how to implement
it using Fetch.

The example we used was this expression: 

    length (intersect (friendsOf x) (friendsOf y))

How does this look when used with our Fetch monad? Any oper- 
ation that may fetch data must be a Fetch operation, hence 

    friendsOf :: UserId -> Fetch [UserId]

while length and intersect are the usual pure functions. So to 
write the expression as a whole we need to lift the pure operations 
into the Applicative world, like so: 

    length <$> intersect' (friendsOf x) (friendsOf y)
      where intersect' = liftA2 intersect

This is just one way we could write it; there are many other
equivalent alternatives. As we shall see in Section 7, it is also
acceptable to use the plain do-notation, together with a
source-to-source transformation that turns do-notation into
Applicative operations:

    do a <- friendsOf x
       b <- friendsOf y
       return (length (intersect a b))

In fact, this is the style we advocate for users of our DSL.

5.4 Semantics of Fetch

It's worth pondering on the implications of what we have done here. 
Arguably we broke the rules: while the Applicative laws do hold 
for Fetch, the documentation for Applicative also states that if a 
type is also a Monad, then its Applicative instance should satisfy 
pure = return and <*> = ap. This is clearly not the case for our 
Applicative instance. But in some sense, our intentions are pure: 
the goal is for code written using Applicative to execute more 
efficiently, not for it to give a different answer than when written 
using Monad. 

Our justification for this Applicative instance is based on
more than its literal definition. We intend dataFetch to have
certain properties: it should not be observable to the programmer
writing code using Fetch whether their dataFetch calls were
performed concurrently or sequentially, or indeed in which order
they were performed; the results should be the same. Therefore,
dataFetch should not have any observable side-effects: all our
requests must be read-only. To the user of Fetch it is as if the
Applicative instance is the default <*> = ap, except that the
code runs more efficiently, and for this to be the case we must
restrict ourselves to read-only requests (although we return to this
question and consider side-effects again in Section 9.3).

2 Sample code is available at
https://github.com/simonmar/haxl-icfp14-sample-code

Life is not quite that simple, however, since we are reading data 
from the outside world, and the data may change between calls to 
dataFetch. The programmer might be able to observe a change in 
the data and hence observe an ordering of dataFetch operations. 
Our approach is to close this loophole as far as we can: in Section 6
we add a cache to the system, which will ensure that identical
requests always return the same result within a single run of Fetch.
Technically we can argue that runFetch is in the IO monad and
therefore we are justified in making a non-deterministic choice for 
the ordering of dataFetch operations, but in practice we find that 
for the majority of applications this technicality is not important: 
we just write code as if we are working against a snapshot of the 
external data. 

If we actually did have access to an unchanging snapshot of the 
remote data, then we could make a strong claim of determinism for 
the programming model. Of course that's not generally possible 
when there are multiple data sources in use, although certain
individual data sources do support access to a fixed snapshot of
their data; one example is Datomic (footnote 3).

5.5 Bulk operations: mapM and sequence 

In our example blog code we used the combinators mapM and 
sequence to perform bulk operations. As things stand in Haskell 
today, these functions are defined using monadic bind, for example 
sequence is defined in the Haskell 2010 Report as 

    sequence :: Monad m => [m a] -> m [a]
    sequence = foldr mcons (return [])
      where mcons p q = do x <- p; y <- q; return (x:y)

Unfortunately, because this uses monadic bind rather than Ap- 
plicative <*>, in our framework it will serialise the operations 
rather than perform them concurrently. Fortunately sequence 
doesn't require monadic bind; Applicative is sufficient [7], and
indeed the Data.Traversable module provides an equivalent
that uses Applicative: sequenceA. Similarly, traverse is the 
Applicative equivalent of mapM. Nevertheless, Haskell program- 
mers tend to be less familiar with the Applicative equivalents, so 
in our EDSL library we map sequence to sequenceA and mapM to 
traverse, so that client code can use these well-known operations 
and obtain automatic concurrency. 

In due course when Applicative is made a superclass of 
Monad, the Applicative versions of these functions will become 
the defaults, and our workaround can be removed without changing 
the client code or its performance. 
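
The difference between the two definitions can be seen with a toy
cost model. The sketch below is an illustration under stated
assumptions, not part of the framework: Cost is a hypothetical type
that only records how many rounds a computation would need, with <*>
sharing rounds and >>= adding them, and sequenceM is the bind-based
definition quoted above under a different name.

```haskell
-- Toy cost model (an assumption for illustration): a value together with
-- the number of fetch rounds needed to produce it.
data Cost a = Cost Int a

rounds :: Cost a -> Int
rounds (Cost n _) = n

instance Functor Cost where
  fmap f (Cost n a) = Cost n (f a)

instance Applicative Cost where
  pure = Cost 0
  -- Independent computations share their rounds.
  Cost m f <*> Cost n a = Cost (max m n) (f a)

instance Monad Cost where
  -- A dependent computation starts only after the first has finished.
  Cost m a >>= k = let Cost n b = k a in Cost (m + n) b

-- One request that needs a single round.
fetchOne :: Int -> Cost Int
fetchOne = Cost 1

-- sequence written with monadic bind, as in the Haskell 2010 Report.
sequenceM :: Monad m => [m a] -> m [a]
sequenceM = foldr mcons (return [])
  where mcons p q = do x <- p; y <- q; return (x : y)

serialRounds, batchedRounds :: Int
serialRounds  = rounds (sequenceM (map fetchOne [1, 2, 3]))
batchedRounds = rounds (sequenceA (map fetchOne [1, 2, 3]))
```

Under this model the bind-based version needs one round per element
(serialRounds is 3), while sequenceA needs a single round
(batchedRounds is 1).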

6. Adding a cache 

In Section 2.1 we identified two ways that the framework can
provide automatic performance benefits for the application. So far
we have demonstrated the first, namely exploiting implicit concur- 
rency. In this section we turn our attention to the second: avoiding 
duplicate requests for data. 



3 http://www.datomic.com/



The solution is not surprising, namely to add caching. However, 
as we shall see, the presence of a cache provides some rather nice 
benefits in addition to the obvious performance improvements. 

Recall that data is fetched using dataFetch: 

    dataFetch :: Request a -> Fetch a

Caching amounts to memoising this operation, such that the 
second time it is called with a request that has been previously 
issued, it returns the result from the original request. Not only do
we gain performance by not repeating identical data-fetches, but, as
mentioned in Section 5.4, the programmer can rely on identical
requests returning the same results, which provides consistency
within a single Fetch computation in the face of data that might
be changing.

We also gain the ability to do some source-to-source transfor- 
mations. For example, common subexpression elimination: 

    do x <- N; M
    ==>
    do x <- N; M[return x/N]

where M and N stand for arbitrary Fetch expressions, and
M[return x/N] denotes M with occurrences of N replaced by return x.
This holds provided dataFetch is the only way to do I/O in our
framework, and all dataFetch requests are cached.

6.1 Implementing the cache 

Let's consider how to add a cache to the system. In order to store a 
mapping from requests to results, we need the following API: 

    data DataCache

    lookup :: Request a -> DataCache -> Maybe a
    insert :: Request a -> a -> DataCache -> DataCache

If we want to use an existing efficient map implementation, we 
cannot implement this API directly because its type-correctness 
relies on the correctness of the map implementation, and the Eq 
and Ord instances for Request. But if we trust these, Haskell 
provides an unsafe back-door, unsafeCoerce, that lets us convey
this promise to the type system. The use of unsafe features to 
implement a purely functional API is common practice in Haskell; 
often the motivation is performance, but here it is the need to 
maintain a link between two types in the type system. 

A possible implementation is as follows: 

    newtype DataCache =
      DataCache (forall a . HashMap (Request a) a)

The contents of a DataCache is a mapping that, for all types a, 
maps things of type Request a to things of type a. The invariant 
we require is that a key of type Request a is either not present 
in the mapping, or maps to a value of type a. We will enforce the 
invariant when an element is inserted into the Map, and assume it 
when an element is extracted. If the Map is correctly implemented, 
then our assumption is valid. 

Note that we use a HashMap rather than a plain Map. This 
is because Map requires the key type to be an instance of the 
Ord class, but Ord cannot be defined for all Request a because 
it would entail comparing keys of different types. On the other 
hand, HashMap requires Eq and Hashable, both of which can 
be straightforwardly defined for Request a, the former using a 
standalone deriving declaration: 

deriving instance Eq (Request a) 

and the latter with a hand-written Hashable instance (see the
sample code, footnote 4).

4 https://github.com/simonmar/haxl-icfp14-sample-code



Looking up in the cache is simply a lookup in the Map: 

    lookup :: Request a -> DataCache -> Maybe a
    lookup key (DataCache m) = Map.lookup key m

This works because we have already declared that the Map in a 
DataCache works for all types a. The insert operation is where 
we have to make a promise to the type system: 

    insert :: Request a -> a -> DataCache -> DataCache
    insert key val (DataCache m) =
      DataCache $ unsafeCoerce (Map.insert key val m)

We can insert a key/value pair into the Map without any difficulty. 
However, that results in a Map instantiated at a particular type a 
(the type of val passed to insert), so in order to get back a 
Map that works for any a we need to apply unsafeCoerce. The
unsafeCoerce function has this type:

    unsafeCoerce :: forall a b . a -> b

Therefore, applying unsafeCoerce to the Map allows it to be 
generalised to the type required by DataCache. 
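
For comparison, the same lookup/insert API can be sketched without
unsafeCoerce by storing values as Data.Dynamic and checking the type
at run time. This is a different technique from the one described
above, shown only as an illustration. It assumes the containers
package for Data.Map, a hypothetical requestKey function that gives
each request a monomorphic key, and the names insertCache and
lookupCache (chosen to avoid clashing with the Prelude); PostId is
assumed to be Int.

```haskell
{-# LANGUAGE GADTs #-}

import Data.Dynamic (Dynamic, fromDynamic, toDyn)
import Data.Typeable (Typeable)
import qualified Data.Map.Strict as Map

type PostId = Int  -- assumption for this sketch

data Request a where
  FetchPosts     :: Request [PostId]
  FetchPostViews :: PostId -> Request Int

-- Hypothetical helper: a key that identifies a request regardless of
-- its result type.
requestKey :: Request a -> String
requestKey FetchPosts         = "FetchPosts"
requestKey (FetchPostViews p) = "FetchPostViews " ++ show p

newtype DataCache = DataCache (Map.Map String Dynamic)

emptyCache :: DataCache
emptyCache = DataCache Map.empty

insertCache :: Typeable a => Request a -> a -> DataCache -> DataCache
insertCache req val (DataCache m) =
  DataCache (Map.insert (requestKey req) (toDyn val) m)

-- fromDynamic checks the stored type at run time, so a mismatch yields
-- Nothing rather than an ill-typed value.
lookupCache :: Typeable a => Request a -> DataCache -> Maybe a
lookupCache req (DataCache m) =
  Map.lookup (requestKey req) m >>= fromDynamic

demoCache :: DataCache
demoCache = insertCache (FetchPostViews 3) 42 emptyCache
```

The trade-off is a Typeable constraint and a dynamic check on every
lookup, in exchange for not having to trust the map implementation
and the key instances.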

Now we have a cache that can store a type-safe mapping from 
requests to results. We will need to plumb this around the Monad to 
pass it to each call to dataFetch so that we can check the cache 
for a previous result. However, this won't be enough: consider what 
happens when we make two identical requests in the same round: 
there won't be a cached result, but nevertheless we want to ensure 
that we only make a single request and use the same result for 
both dataFetch calls. Indeed, this happens several times in our
blog example: the first round issues three calls to getPostIds, for
example.

In dataFetch we need to distinguish three different cases: 

1. The request has not been encountered before: we need to create 
a BlockedRequest, and block. 

2. The request has already been fetched: we can return the cached 
result and continue. 

3. The request has been encountered in the current round but 
not yet fetched: we need to block, but not create a new 
BlockedRequest since it will already have been added to the 
set of requests to fetch elsewhere. 

The key idea is that in the third case we can share the IORef 
(FetchStatus a) from the BlockedRequest that was created 
the first time the request was encountered. Hence, all calls to 
dataFetch for a given request will automatically share the same 
result. How can we find the IORef for a request? We store it in the 
cache. 

So instead of storing only results in our DataCache, we need to 
store IORef (FetchStatus a) . This lets us distinguish the three 
cases above: 

1. The request is not in the DataCache. 

2. The request is in the DataCache, and the IORef contains 
FetchSuccess a. 

3. The request is in the DataCache, and the IORef contains 
NotFetched. 

This implies that we must add an item to the cache as soon as the 
request is issued; we don't wait until the result is available. Filling 
in the details, our DataCache now has the following API: 

    data DataCache

    lookup :: Request a -> DataCache
           -> Maybe (IORef (FetchStatus a))

    insert :: Request a -> IORef (FetchStatus a)
           -> DataCache -> DataCache

(the implementation is the same). The cache itself needs to be
stored in an IORef and passed around in the monad; Fetch now
has this definition:

    newtype Fetch a = Fetch {
      unFetch :: IORef DataCache -> IO (Result a) }

The alterations to the Monad and Applicative instances are
straightforward, so we omit them here.

    dataFetch :: Request a -> Fetch a
    dataFetch req = Fetch $ \ref -> do
      cache <- readIORef ref
      case lookup req cache of
        Nothing -> do
          box <- newIORef NotFetched
          writeIORef ref (insert req box cache)
          let br = BlockedRequest req box
          return (Blocked (singleton br) (cont box))
        Just box -> do
          r <- readIORef box
          case r of
            FetchSuccess result -> return (Done result)
            NotFetched -> return (Blocked Seq.empty (cont box))
      where
        cont box = Fetch $ \ref -> do
          FetchSuccess a <- readIORef box
          return (Done a)

Figure 4. dataFetch implementation with caching

The definition of dataFetch is given in Figure 4. The three
cases identified earlier are dealt with in that order:

1. If the request is not in the cache, then we create a new IORef 
for the result (initially containing NotFetched) and add that 
to the cache. Then we create a BlockedRequest, and return 
Blocked in the monad, with a continuation that will read the 
result from the IORef we created. 

2. If the request is in the cache, then we check the contents of 
the IORef. If it contains FetchSuccess result, then we have 
a cached result, and dataFetch returns Done immediately (it 
doesn't block). 

3. If the contents of the IORef is NotFetched, then we return 
Blocked, but with an empty set of BlockedRequests, and a 
continuation that will read the result from the IORef. 
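
The three cases can be exercised in isolation with a small sketch.
It is a simplification under stated assumptions: requests are plain
strings and results are Ints (so no GADT or unsafeCoerce is needed),
classify is a hypothetical function that reports which case applies
instead of building a Result, and completeRound stands in for the
fetch performed at the end of a round. Data.Map comes from the
containers package.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import qualified Data.Map.Strict as Map

data FetchStatus a = NotFetched | FetchSuccess a

-- Hypothetical monomorphic cache: string requests, Int results.
type Cache = Map.Map String (IORef (FetchStatus Int))

data Outcome
  = MustFetch       -- case 1: new request; block and add it to the round
  | Cached Int      -- case 2: result already available; do not block
  | AlreadyPending  -- case 3: block, but add no new request
  deriving (Eq, Show)

classify :: IORef Cache -> String -> IO Outcome
classify ref req = do
  cache <- readIORef ref
  case Map.lookup req cache of
    Nothing -> do
      box <- newIORef NotFetched
      -- The entry is recorded as soon as the request is issued.
      writeIORef ref (Map.insert req box cache)
      return MustFetch
    Just box -> do
      status <- readIORef box
      return $ case status of
        FetchSuccess r -> Cached r
        NotFetched     -> AlreadyPending

-- Stand-in for the end of a round: write a result into every box.
completeRound :: IORef Cache -> (String -> Int) -> IO ()
completeRound ref serve = do
  cache <- readIORef ref
  mapM_ fill (Map.toList cache)
  where
    fill (req, box) = writeIORef box (FetchSuccess (serve req))

demo :: IO [Outcome]
demo = do
  ref <- newIORef Map.empty
  a <- classify ref "views:1"   -- first sighting
  b <- classify ref "views:1"   -- same round, not yet fetched
  completeRound ref length
  c <- classify ref "views:1"   -- a later round
  return [a, b, c]
```

With length as the pretend data source, demo yields
[MustFetch, AlreadyPending, Cached 7].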

6.2 Cache Persistence and Replaying 

Within a single runFetch, the cache only accumulates informa- 
tion, and never discards it. In the use-cases we have described, this 
is not a problem: requests to a network-based service typically take 
a short period of time to deliver the result, after which we can dis- 
card the cache. During a computation we don't want to discard any 
cached data, because the programmer might rely on the cache for 
consistency. 

We have found that the cache provides other benefits in addition 
to the ones already described: 

• At the end of a Fetch computation, the cache is a complete
record of all the requests that were made, and the data that was
fetched. Re-running the computation with the fully populated
cache is guaranteed to give the same result, and will not fetch
any data. So by persisting the cache, we can replay computations
for the purposes of fault diagnosis or profiling. When the
external data is changing rapidly, being able to reliably
reproduce past executions is extremely valuable.

• We can store things in the cache that are not technically re- 
mote data fetches, but nevertheless we want to have a single 
deterministic value for. For example, in our implementation we 
cache the current time: within a Fetch computation the current 
time is a constant. We can also memoise whole Fetch compu- 
tations by storing their results in the cache. 

7. Automatic Applicative 

Our Fetch abstraction requires the programmer to use the
operations of Applicative in order to benefit from concurrency. While
these operations are concise and expressive, many programmers are 
more comfortable with monadic notation and prefer to use it even 
when Applicative is available. Furthermore, we don't want to pe- 
nalise code that uses monadic style: it should be automatically con- 
current too. Our monad is commutative, so we are free to re-order 
operations at will, including replacing serial >>= with concurrent 
<*>. 

In general, the transformation we want to apply is this: 

    do p <- A; q <- B; ...
    ==> {- if no variable of p is a free variable of B -}
    do (p,q) <- (,) <$> A <*> B

for patterns p and q and expressions A and B. The transforma- 
tion can be applied recursively, so that long sequences of indepen- 
dent statements in do-notation can be automatically replaced by 
Applicative notation. 

At the time of writing, the transformation is proposed but not 
implemented in GHC; it is our intention to implement it as an 
optional extension (because it is not necessarily valid for every 
Applicative instance). In our Haxl implementation we currently 
apply this transformation as part of the automatic translation of our 
existing DSL into Haskell. 
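
GHC releases from 8.0 onwards provide an ApplicativeDo language
extension that rewrites do-notation along these lines. The sketch
below shows its effect on a toy cost model and should not be read as
the Haxl implementation: Cost is a hypothetical type that records the
number of rounds a computation would need, with <*> sharing rounds
and >>= adding them.

```haskell
{-# LANGUAGE ApplicativeDo #-}

-- Toy cost model (an assumption for illustration): a value together with
-- the number of fetch rounds needed to produce it.
data Cost a = Cost Int a

rounds :: Cost a -> Int
rounds (Cost n _) = n

instance Functor Cost where
  fmap f (Cost n a) = Cost n (f a)

instance Applicative Cost where
  pure = Cost 0
  Cost m f <*> Cost n a = Cost (max m n) (f a)

instance Monad Cost where
  Cost m a >>= k = let Cost n b = k a in Cost (m + n) b

fetchOne :: Int -> Cost Int
fetchOne = Cost 1

-- Neither statement mentions a variable bound by the other, so with
-- ApplicativeDo this block is compiled to <$> and <*>.
independent :: Cost Int
independent = do
  x <- fetchOne 1
  y <- fetchOne 2
  return (x + y)

-- The same computation with explicit binds, which the extension leaves alone.
viaBind :: Cost Int
viaBind = fetchOne 1 >>= \x -> fetchOne 2 >>= \y -> return (x + y)
```

Under this model the do block needs one round while the explicit
binds need two, which is the saving the transformation is meant to
recover for code written in monadic style.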

8. Exceptions 

Handling failure is an important part of a framework that is de- 
signed to retrieve data from external sources. We have found that 
it is important for the application programmer to be able to han- 
dle failure, particularly transient failures that occur due to network 
problems or outages in external services. In these cases the pro- 
grammer typically wants to choose between having the whole com- 
putation fail, or substituting a conservative default value in place of 
the data requested. 

We need to consider failure in two ways: first, the way in which 
exceptions propagate in the monad, and second, how failure is 
handled at the data-fetching layer. We'll deal with these in order. 

8.1 Exceptions in Fetch 

First, we add explicit exception support to our monad. We need to 
add one constructor to the Result type, Throw, which represents a 
thrown exception: 

    data Result a
      = Done a
      | Blocked (Seq BlockedRequest) (Fetch a)
      | Throw SomeException

The SomeException type is from Haskell's Control.Exception
library and represents an arbitrary exception [6]. To throw an
exception we need to convert it to a SomeException and return it
with Throw:



    throw :: Exception e => e -> Fetch a
    throw e = Fetch $ \_ ->
      return (Throw (toException e))

The Monad instance for Fetch with the Throw constructor is as 
follows: 

    instance Monad Fetch where
      return a = Fetch $ \ref -> return (Done a)

      Fetch m >>= k = Fetch $ \ref -> do
        r <- m ref
        case r of
          Done a       -> unFetch (k a) ref
          Blocked br c -> return (Blocked br (c >>= k))
          Throw e      -> return (Throw e)

and Figure 5 gives the Applicative instance. It is straightforward
except for one case: in <*>, where the left side returns Blocked 
and the right side returns Throw, we must not propagate the excep- 
tion yet, and instead we must return a Blocked computation. The 
reason is that we don't yet know whether the left side will throw 
an exception when it becomes unblocked; if it does throw an ex- 
ception, then that is the exception that the computation as a whole 
should throw, and not the exception from the right argument of <*>. 
If we were to throw the exception from the right argument of <*> 
immediately, the result would be non-determinism: the exception 
that gets thrown depends on whether the left argument blocks. 

We also need a catch function:

    catch :: Exception e
          => Fetch a -> (e -> Fetch a) -> Fetch a

    catch (Fetch h) handler = Fetch $ \ref -> do
      r <- h ref
      case r of
        Done a -> return (Done a)
        Blocked br c ->
          return (Blocked br (catch c handler))
        Throw e -> case fromException e of
          Just e' -> unFetch (handler e') ref
          Nothing -> return (Throw e)

As with catch in the IO monad, our catch catches only
exceptions of the type expected by the handler (the second argument
to catch). The function fromException returns Just e' if the
exception can be coerced to the appropriate type, or Nothing
otherwise. The interesting case from our perspective is the Blocked
case, where we construct the continuation by wrapping a call to 
catch around the inner continuation. 
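
The ordering rule for exceptions in <*> and the behaviour of catch
can be tried out in a much smaller model. The following sketch is not
the instance from Figure 5: it assumes a pure Fetch type in which
Blocked only marks a round boundary (no requests, no IO, no cache),
plus a hypothetical FetchError exception and a run function that
steps through the rounds.

```haskell
import Control.Exception (Exception, SomeException, fromException, toException)

-- Simplified model (an assumption of this sketch): Blocked marks a round
-- boundary but carries no requests.
data Fetch a
  = Done a
  | Blocked (Fetch a)
  | Throw SomeException

instance Functor Fetch where
  fmap f (Done a)    = Done (f a)
  fmap f (Blocked c) = Blocked (fmap f c)
  fmap _ (Throw e)   = Throw e

instance Applicative Fetch where
  pure = Done
  Done f    <*> x = fmap f x
  Throw e   <*> _ = Throw e   -- an exception on the left wins
  -- Whatever the right side is (even a Throw), wait for the left side to
  -- resume first, so that an exception on the left takes priority.
  Blocked c <*> x = Blocked (c <*> x)

throw :: Exception e => e -> Fetch a
throw = Throw . toException

catch :: Exception e => Fetch a -> (e -> Fetch a) -> Fetch a
catch (Done a)    _ = Done a
catch (Blocked c) h = Blocked (catch c h)
catch (Throw e)   h = maybe (Throw e) h (fromException e)

-- Drive a computation to completion; rounds do nothing in this model.
run :: Fetch a -> Either SomeException a
run (Done a)    = Right a
run (Blocked c) = run c
run (Throw e)   = Left e

-- Hypothetical exception type for the demonstration.
newtype FetchError = FetchError String deriving (Eq, Show)
instance Exception FetchError

-- The left side blocks and then throws; the right side throws at once.
leftWins :: Fetch Int
leftWins =
  ((+) <$> Blocked (throw (FetchError "left"))) <*> throw (FetchError "right")

-- Report which exception surfaced.
observed :: Fetch String
observed = (show <$> leftWins) `catch` \(FetchError msg) -> Done msg
```

Running observed gives Right "left": the handler sees the exception
from the left argument even though the right argument failed first,
which is the deterministic behaviour argued for above.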

8.2 Exceptions in dataFetch 

When a failure occurs in a data fetching operation, it must be 
thrown as an exception to the caller of dataFetch. We need to pro- 
gram this propagation explicitly, because the data is being fetched 
in the top-level runFetch loop, outside the context of the Fetch 
computation that called dataFetch. 

We propagate an exception in the same way that we communi- 
cate the result of the data fetch: via the IORef that stores the result. 
So we modify the FetchStatus type to include the possibility that 
the fetch failed with an exception: 

    data FetchStatus a
      = NotFetched
      | FetchSuccess a
      | FetchFailure SomeException



and we also modify dataFetch to turn a FetchFailure into a 
Throw after the fetch has executed (these modifications are straight- 
forward, so we omit the code here). 

This is all the support we need for exceptions. There is one pit- 
fall: we found in our real implementation that some care is needed 
in the implementation of a data source to ensure that an exception 
is properly reported as a FetchFailure and not just thrown by the 
data source; the latter causes the whole Fetch computation to be 
aborted, since the exception is thrown during the call to fetch in 
runFetch. 
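
One way a data source might guard against this pitfall is to wrap
each request's IO action in try and record the outcome, as in the
sketch below. It is an assumption-laden illustration: fetchInto and
describe are hypothetical helpers, and the real data sources may
handle failures differently.

```haskell
import Control.Exception (SomeException, throwIO, try)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

data FetchStatus a
  = NotFetched
  | FetchSuccess a
  | FetchFailure SomeException

-- Run one request's IO action and record the outcome in its box, so that
-- a failure is reported to the caller of dataFetch instead of escaping.
-- Catching SomeException is deliberately broad here; a production data
-- source would probably let asynchronous exceptions through.
fetchInto :: IORef (FetchStatus a) -> IO a -> IO ()
fetchInto box action = do
  outcome <- try action
  writeIORef box (either FetchFailure FetchSuccess outcome)

describe :: Show a => FetchStatus a -> String
describe NotFetched       = "not fetched"
describe (FetchSuccess a) = "success: " ++ show a
describe (FetchFailure _) = "failure"

demo :: IO [String]
demo = do
  good <- newIORef NotFetched
  bad  <- newIORef NotFetched
  fetchInto good (return (42 :: Int))
  fetchInto bad  (throwIO (userError "service unavailable") :: IO Int)
  mapM (fmap describe . readIORef) [good, bad]
```

Here demo yields ["success: 42", "failure"], and the exception from
the second action does not abort the surrounding IO code.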

9. Implementation and evaluation 

The basics of our use-case at Facebook were introduced in
Section 2.2. Essentially it is a network-based service that is used to
detect and eliminate spam, malware, and other undesirable content 
on Facebook. There are about 600 different kinds of request, all 
implemented by a body of Haskell code of approximately 200,000 
lines; this was automatically translated into Haskell from our pre- 
vious in-house DSL, FXL. 

The system can be viewed as a rule-engine, where rules are 
Fetch computations. Each request runs a large set of rules and 
aggregates the results from all the rules. Rules are often (but not 
always) short, and most of them fetch some external data. In our 
system we run all the rules for a request using sequence; this has 
the effect of executing all the rules concurrently. 

We will give an outline of our implementation in the next sec- 
tion, and then present some preliminary results. 

9.1 Implementation 

In the earlier description, the implementation of the Fetch monad 
depended on the Request type, because the monad carries around 
a DataCache that stores Requests, and the dataFetch operation 
takes a Request as an argument. This is straightforward but some- 
what inconvenient, because we want to have the flexibility to add 
new data sources in a modular way, without modifying a single 
shared Request type. Furthermore, we want to be able to build 
and test data sources independently of each other, and to test the 
framework against "mock" versions of the data sources that don't 
fetch data over the wire. 

To gain this flexibility, in our implementation we abstracted the 
core framework over the data sources and request types. Space 
limitations preclude a full description of this, but the basic idea 
is to use Haskell's Typeable class so that we can store requests of 
arbitrary type in the cache. The dataFetch operation has this type: 

    dataFetch :: (DataSource req, Request req a)
              => req a -> Fetch a

where Request is a package of constraints including Typeable, 
and DataSource is defined like this: 

    class DataSource req where
      fetch :: [BlockedFetch req] -> PerformFetch

    data PerformFetch
      = SyncFetch (IO ())
      | AsyncFetch (IO () -> IO ())



    instance Applicative Fetch where
      pure = return

      Fetch f <*> Fetch x = Fetch $ \ref -> do
        ...

Figure 5. Applicative instance for Fetch with exceptions

A data source is coupled to the type of requests that it serves, so for
each request type there must be an instance of DataSource that
defines how those requests are fetched. The fetch method takes
a list of BlockedRequests containing requests that belong to this
data source (the BlockedRequest type is now parameterised by
the type of the request that it contains). The job of fetch is to fetch
the data for those requests; it can do that synchronously or
asynchronously, indicated by the PerformFetch type. An AsyncFetch
is a function that takes as an argument the IO operation to perform
while the data is being fetched. The idea is that when fetching data
from multiple sources we wrap all the asynchronous fetches around
a sequence of the synchronous fetches:

    scheduleFetches :: [PerformFetch] -> IO ()
    scheduleFetches fetches = asyncs syncs
      where
        asyncs = foldr (.) id [f | AsyncFetch f <- fetches]
        syncs  = sequence_ [io | SyncFetch io <- fetches]

In our implementation, most data sources are asynchronous. 
Maximal concurrency is achieved when at most one data source 
in a given round is synchronous, which is the case for the vast ma- 
jority of our fetching rounds. When there are multiple synchronous 
data sources we could achieve more concurrency by using Haskell's 
own concurrency mechanisms; this is something we intend to ex- 
plore in the future. 
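
The nesting produced by scheduleFetches is easiest to see by logging
what each source does. In this sketch the PerformFetch type and
scheduleFetches are as given above, while asyncSource, syncSource,
and say are hypothetical helpers that only append to a log; no real
I/O is performed.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

data PerformFetch
  = SyncFetch (IO ())
  | AsyncFetch (IO () -> IO ())

scheduleFetches :: [PerformFetch] -> IO ()
scheduleFetches fetches = asyncs syncs
  where
    asyncs = foldr (.) id [f | AsyncFetch f <- fetches]
    syncs  = sequence_ [io | SyncFetch io <- fetches]

say :: IORef [String] -> String -> IO ()
say logRef msg = modifyIORef logRef (++ [msg])

-- Hypothetical asynchronous source: send the request, run the inner
-- action while it is in flight, then wait for the reply.
asyncSource :: IORef [String] -> String -> PerformFetch
asyncSource logRef name = AsyncFetch $ \inner -> do
  say logRef ("send " ++ name)
  inner
  say logRef ("wait " ++ name)

-- Hypothetical synchronous source: a single blocking call.
syncSource :: IORef [String] -> String -> PerformFetch
syncSource logRef name = SyncFetch (say logRef ("sync " ++ name))

demo :: IO [String]
demo = do
  logRef <- newIORef []
  scheduleFetches
    [ asyncSource logRef "A"
    , syncSource  logRef "S"
    , asyncSource logRef "B"
    ]
  readIORef logRef
```

The log comes out as send A, send B, sync S, wait B, wait A: both
asynchronous requests are in flight while the synchronous fetch runs,
and the waits happen in the reverse order of the sends.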

9.2 Results 

To evaluate how well our system exploits concurrency, we ran a 
random sample of 10,000 actual requests for a single common re- 
quest type. We measured the number of data fetches performed by 
each request (not including those that were served from the cache), 
the number of rounds (batches of fetches performed concurrently), 
and the total end-to-end processing time of each request. Figure 6
gives the results, in the form of histograms of the number of
requests against fetches, rounds, and total time (latency). Note that
the number of requests on the Y-axis is a log scale. In the histogram 
of fetches, the buckets are 5 wide, so for example the first bar rep- 
resents the number of requests with 10-15 data fetches (there were 
no requests that performed fewer than 10 fetches). The histogram 
of rounds has integral buckets, and the time histogram has buckets 
of 20ms. 

Figure 7 gives the 50th (median), 95th, and 99th percentiles,
and the maximum value, for each of fetches, rounds, and time. 
Note that the figures for each column were calculated by sorting 
the requests by fetches, rounds, and time respectively. It is not 
necessarily the case that the request that performed the maximum 
number of fetches is the same request that took the maximum 
number of rounds or the longest time. 

We can see that 95% of our 10,000 requests require at most 4 
rounds of fetching (median 3), 95% perform at most 27 data fetches 
(median 18), and 95% run in at most 26.3ms (median 9.5ms). There 
is a long tail, however, with some requests requiring more than 
2000 data fetches. A few requests took an inordinately long time 
to run (the longest was 2.2s), and this turned out to be because one 
particular data fetch to another service took a long time. 



The second table in Figure 7 shows for comparison what happens
when we disable concurrency; this was achieved by making
(<*>) = ap, so that <*> no longer batches together the fetches 
from both of its arguments (caching was still enabled, however). 
We can see that the number of rounds is equal to the number of 
fetches, as expected. The experiments were run against production 
data, so there are minor differences in the number of fetches
between the two runs in Figure 7, but we can see that the effect on
total runtime is significant, increasing the median time for a request
by 51%. One extreme example is the request that required 2793 
fetches, which increased from 220ms to 1.3s with concurrency dis- 
abled. Concurrency had no effect on the pathological data fetches, 
so the maximum time was unchanged at 2.2s. 

9.2.1 Discussion 

We have shown that the automatic concurrency provided by our 
framework has a sizeable impact on latency for requests in our 
system, but is it enough? Our existing FXL-based system performs 
similar data-fetching optimisations, but it does so using a special- 
purpose interpreter, whereas our Haskell version is implemented in 
libraries without modifying the language implementation. 

Our workload is primarily I/O bound, so although Haskell is 
far faster than FXL at raw compute workloads, this has little ef- 
fect on comparisons between our two systems. Thus we believe 
that executing data fetches concurrently is the most important fac- 
tor affecting performance, and if the Haskell system were less able 
to exploit concurrency that would hinder its performance in these 
benchmarks. At the time of writing we have only preliminary
measurements, but performance of the two systems does appear to be
broadly similar, and we have spent very little time optimising the 
Haskell system so far. 

It is also worth noting that the current workload is I/O bound 
partly because compute-heavy tasks have historically been of- 
floaded to C++ code rather than written in FXL, because using 
FXL would have been too slow. In the Haskell version of our sys- 
tem we have reimplemented some of this functionality natively in 
Haskell, because its performance is more than adequate for com- 
pute tasks, and the Haskell code is significantly cleaner and safer. 
We believe that being able to implement compute tasks directly in 
Haskell will empower the users of our DSL to solve problems that 
they couldn't previously solve without adding C++ primitives to 
the language implementation. 

9.3 Using Applicative Concurrency with Side-effects 

As described, our framework has no side-effects except for reading,
for good reason: operations in Fetch may take place in any order
(Section 5.4). However, side-effects are important. For example, a
web application needs to take actions based on user input, and it
might need to generate some statistics that get stored. Our
implementation at Facebook has various side effects, including storing
values in a separate memcache service, and incrementing shared
counters.

Figure 6. Results

One safe way to perform side effects is to return them from 
runFetch, and perform them afterwards. Indeed, this is exactly the 
way that side effects are typically performed when using Software 
Transactional Memory (STM) in Haskell. 

Sometimes it is convenient to allow side-effects as part of the 
Fetch computation itself. This is fine as long as it is not possible 
to observe the side-effect with a Fetch operation, which would 
expose the ordering of operations to the user. But this is quite 
flexible: we can, for example, have a write-only instance of Fetch 
that allows write operations to benefit from concurrency (obviously, 
the cache is not necessary for this), or we can have side-effects that 
cannot be observed, such as accumulating statistics. 

10. Comparison and related work 

Probably the closest relatives to the Fetch framework are the
family of async programming models that have been enjoying
popularity recently in several languages: F# [13], C#, OCaml [15],
Scala [3], and Clojure (footnote 5).

A common trait of these programming models is that they are 
based on a concurrency-monad-like substrate; they behave like 
lightweight threads with cooperative scheduling. When a compu- 
tation is suspended, its continuation is saved and re-executed later. 
These frameworks are typically good for scheduling large numbers 
of concurrent I/O tasks, because they have lower overhead than the 
heavyweight threads of their parent languages. 

In contrast with the Fetch framework, the async style has an 
explicit fork operation, in the form of an asynchronous method call 
that returns an object that can later be queried for the result. For 
example, in C# a typical sequence looks like this: 

    Task<int> a = getData();
    int y = doSomethingElse();
    int x = await a;

The goal in this pattern is to perform getData() concurrently with
doSomethingElse(). The effects of getData() will be interleaved
with those of doSomethingElse(), although the degree of
non-determinism is tempered somewhat by the use of cooperative
scheduling.

Ignoring non-determinism, in our system this could be written

    do [x,y] <- sequence [getData, doSomethingElse]

making it clear that getData and doSomethingElse are executed
together.

Figure 7. Summary results, with and without concurrency

Explicit blocking (as in await above) is often shunned in these 
programming models; instead it is recommended to attach callback 
methods to the results, like this (in Scala): 

    val future = getData();
    future map(x => x + 1);

This has the advantage that we don't have to block on the result 
of the future in order to operate on it, which allows the system 
to exploit more concurrency. However, the programming model is 
somewhat indirect; in our system, this would be written 

do x <- getData; return (x+1) 



5 http://clojure.github.io/core.async/



Reactive programming models [8] add another dimension to
asynchronous programming, where instead of a single result being 
returned, there is a stream of results. This is a separate problem 
space from the one we are addressing in this paper. 

11. Further work 

The method for taking advantage of concurrency described in
Section 4 is fairly simplistic: we run as much computation as possible,
and then perform all the data fetching concurrently, repeating these 
two steps as many times as necessary to complete the computation. 
There are two ways we could overlap the computation phase with 
the data-fetching phase: 

• As soon as we have the result of any data fetch, we can start 
running the corresponding blocked part(s) of the computation. 

• We might want to emit some requests before we have finished 
exploring the whole computation. This potentially reduces con- 
currency but might also reduce latency. 

We intend to investigate these in future work. 